Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node properties: support for leaf nodes #475

Merged
merged 5 commits into from
Jul 6, 2023

Conversation

ola-rozenfeld
Copy link
Contributor

Progress on #460 (95% of the fix).

In this PR, only support node properties for leaf nodes (files and empty directories); the reason is that to do it for all nodes will require me to change a current invariant / output in tree.go, and I'd like to split it to a separate PR, as it is a bit more invasive change. Also, in practice support for non-leaf nodes is much less urgent, because the main goal here is to support Bazel's use case, which is to mark files as "bazel_tool_input" in order to implement persistent remote workers.

go/api/command/BUILD.bazel Show resolved Hide resolved
go/pkg/command/command.go Show resolved Hide resolved
} else {
inputRoot = filepath.Join(actionRoot, "input")
npPath := filepath.Join(actionRoot, "input_node_properties.textproto")
if _, err := os.Stat(npPath); err == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to rewrite this with early return, but failed -- only way I saw to return early is to duplicate one of the if clauses, which was, imo, worse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-else block seems to be doing a "set inputRoot and nodeProperties based on the value of actionRoot". That sounds like a good candidate for a helper function that would facilitate using less indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you, good idea! I did one better, I think -- I split the input root handling and the node properties, because they have nothing to do with each other. PTAL.

Comment on lines +605 to +606
// In v2.1 of the RE API the `output_{files, directories}` fields were
// replaced by a single field: `output_paths`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper way of supporting v2.1 is to follow the same deprecation semantics by adding an OutputPaths field, removing the deprecated ones, and handling the precedence during this conversion here as defined in the API.

I'm concerned about unverified use-cases that might break by nilling OutputDirs and assuming all paths fit into OutputFiles.

WDYT?

Copy link
Contributor Author

@ola-rozenfeld ola-rozenfeld Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is, that deprecation never really worked out. Bazel still uses both fields (I see some recent action trying to move towards using one, but no resolution). I suspect Google's RBE still uses files/dirs. So we need to support both fields as much as possible in this client until version 3.0 actually removes the deprecated fields (and not sure if that'll ever happen...).

Here is what I think we should do:

  • when composing an entirely new RE-API proto, we should decide which fields to use based on server capabilities (which we already do AFAICT).
  • when parsing an RE-API proto, we need to work correctly with all versions: if output_paths is set, it should supercede output_files/dirs. Which is what I implement here.
  • the composing and parsing should be orthogonal -- the tool should support reading protos in one format and outputting them in another (e.g. action was downloaded from a different RE than it is executed on, or the RE migrates to a different API version)

The previous implementation didn't support output_paths at all, which meant this tool was not working with servers supporting 2.1, and here I fix that. AFAICT we can indeed safely put all outputs in output_files, because we shouldn't care what they really are; if there is any code that will be broken by this assumption, we need to find out ASAP and fix it, because we actually can't know the type of the output. But I believe that's extremely unlikely. Basically, the only reason these exist as split fields in Command to begin with is because they were split in the RE-API and we needed to support that. All we do is pass them through to RE. We do no semantic checks on them.

Adding an OutputPaths field in Command is possible, but imo it will only add complexity, as we'll still need to populate the RE proto based on all fields, and maintain the fields in sync. I think just using the output files field is simpler. Of course, if someone tries to run a saved 2.1 command with an output directory on a 2.0 RE, this will not work. without them manually modifying the saved proto, but that's true regardless of what field we use.

Still TODO, btw:

  • make the fake server support both RE-API versions (via an option)
  • use parameterized tests to test both, to make sure that we support all combinations (can run old commands on new servers, and vice versa).
  • improve the tool to be more explicit when the action, command and input root digests actually change from what has been requested / found on disk. The tool currently silently recomputes a lot of digests, which can be misleading and cover up errors such as API version mismatches leading to different digests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the well laid-out comment.

My concern was specifically around losing the information about which paths are directories. As you mentioned, this is only a problem when passing a command from 2.1+ server to a 2.0- one, in which case the information (directories) wasn't or shouldn't be there in the first place.

go/pkg/tool/tool.go Outdated Show resolved Hide resolved
} else {
inputRoot = filepath.Join(actionRoot, "input")
npPath := filepath.Join(actionRoot, "input_node_properties.textproto")
if _, err := os.Stat(npPath); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-else block seems to be doing a "set inputRoot and nodeProperties based on the value of actionRoot". That sounds like a good candidate for a helper function that would facilitate using less indentation.

for _, ev := range cmdPb.EnvironmentVariables {
cmd.InputSpec.EnvironmentVariables[ev.Name] = ev.Value
func readNodePropertiesFromFile(path string) (nps map[string]*repb.NodeProperties, err error) {
if _, err = os.Stat(path); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rearrange the conditions such that the terminal one appears first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better now, thank you! PTAL.

Comment on lines +605 to +606
// In v2.1 of the RE API the `output_{files, directories}` fields were
// replaced by a single field: `output_paths`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the well laid-out comment.

My concern was specifically around losing the information about which paths are directories. As you mentioned, this is only a problem when passing a command from 2.1+ server to a 2.0- one, in which case the information (directories) wasn't or shouldn't be there in the first place.

@mrahs mrahs merged commit 5700902 into bazelbuild:master Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants